Mar 2nd, 2017 - Science and technology - Neighbourhood Watch

Xyinkl, Mar 25, 2017
Artificial Intelligence

Neighbourhood Watch

@(TheEconomist)[English, Translation, The Economist]

Millions of images of public streets offer a cheap, sweeping view of America’s demography

“WOULD it not be of great satisfaction to the king to know, at a designated moment every year, the number of his subjects?” A military engineer by the name of Sébastien le Prestre de Vauban posed this question to Louis XIV in 1686, pitching him the idea of a census. All France’s resources, the wealth and poverty of its towns and the disposition of its nobles would be counted, so that the king could control them better.

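In 1686 a military engineer named Sébastien le Prestre de Vauban put a proposal to Louis XIV: how satisfying it would be for the king to know, at a fixed point in each year, how many subjects he ruled. This was his pitch for a census. All of France’s resources, the rich and poor of its towns and the attitudes of its nobles would be counted, so that the king could rule the country better.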

These days, such surveys are common. But they involve a lot of shoe-leather, and that makes them expensive. America, for instance, spends hundreds of millions of dollars every year on a socioeconomic investigation called the American Community Survey; the results can take half a decade to become available. Now, though, a team of researchers, led by Timnit Gebru of Stanford University in California, have come up with a cheaper, quicker method. Using powerful computers, machine-learning algorithms and mountains of data collected by Google, the team carried out a crude, probabilistic census of America’s cities in just two weeks.

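Today such surveys are fairly common. But they still take a great deal of legwork, which makes them costly. America, for example, spends hundreds of millions of dollars a year on a socioeconomic survey called the American Community Survey, and the results can take half a decade to become usable. Now, however, a team led by Timnit Gebru of Stanford University in California has worked out a cheap, fast way of counting. Using powerful computers, machine-learning algorithms and the vast amounts of data collected by Google, the team produced a rough, probability-based census of America’s cities in two weeks.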
【involve a lot of shoe-leather】shoe-leather, n. Leather from which shoes are made, which wears out through walking; figuratively, legwork done on foot.

First, the researchers trained their machine-learning model to recognise the make, model and year of many different types of cars. To do that they used a labelled data set, downloaded from automotive websites like Edmunds and Cars.com. Once the algorithm had learned to identify cars, it was turned loose on 50m images from 200 cities around America, all collected by Google’s Streetview vehicles, which provide imagery for the firm’s mapping applications. Streetview has photographed most of the public streets in America, and in among them the researchers spotted 22m different cars, around 8% of the number on America’s roads.
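
Two rough figures follow from the numbers in this paragraph: the size of the American vehicle fleet implied by "22m cars is around 8%", and the average number of cars found per image. The short sketch below works them out. The variable names are made up for illustration, and since the 8% is a rounded figure the implied fleet size is only approximate.

```python
# Figures quoted in the article.
CARS_SPOTTED = 22_000_000   # distinct cars found in the Streetview images
SHARE_OF_FLEET = 0.08       # "around 8%" of the cars on America's roads
IMAGES = 50_000_000         # images processed

# Fleet size implied by the rounded 8% figure (an estimate, not a count).
implied_fleet = CARS_SPOTTED / SHARE_OF_FLEET

# Average number of distinct cars spotted per image.
cars_per_image = CARS_SPOTTED / IMAGES

if __name__ == "__main__":
    print(f"implied fleet: about {implied_fleet / 1e6:.0f}m vehicles")
    print(f"cars per image: {cars_per_image:.2f}")
```

That works out to roughly 275m vehicles and a little under one car for every two images.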

【recognise the make, model and year of】the make = the brand

The computer classified each of those cars into one of 2,657 categories it had learned from studying the Edmunds and Cars.com data. The researchers then took data from the traditional census, and split them in half. One half was fed to the machine-learning algorithm, so it could hunt for correlations between the cars it saw on the roads in those neighbourhoods and such things as income levels, race and voting intentions. Once that was done, the algorithm was tested on the other half of the census data, to see if these correlations held true for neighbourhoods it had never seen before. They did. The sorts of cars you see in an area, in other words, turn out to be a reliable proxy for all sorts of other things, from education levels to political leanings. Seeing more sedans than pickup trucks, for instance, strongly suggests that a neighbourhood tends to vote for the Democrats.
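
The split-in-half test described here is a standard hold-out check. Below is a minimal sketch of the idea, not the researchers' actual model: it invents synthetic neighbourhoods, fits a straight line relating the share of sedans to Democratic vote share on one half, and measures how well the predictions correlate with the truth on the other half. The data, the assumed linear relationship and every function name are hypothetical.

```python
import math
import random
import statistics


def make_synthetic_neighbourhoods(n, seed=0):
    """Invented data: (sedan_share, democratic_vote_share) pairs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        sedan_share = rng.uniform(0.2, 0.8)
        # Assumed relationship plus noise, purely for illustration.
        vote_share = 0.1 + 0.8 * sedan_share + rng.gauss(0, 0.05)
        rows.append((sedan_share, min(max(vote_share, 0.0), 1.0)))
    return rows


def fit_line(rows):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = statistics.mean(x for x, _ in rows)
    mean_y = statistics.mean(y for _, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def holdout_correlation(rows, seed=0):
    """Fit on a random half, then score predictions on the unseen half."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    train, test = shuffled[:half], shuffled[half:]
    intercept, slope = fit_line(train)
    predicted = [intercept + slope * x for x, _ in test]
    actual = [y for _, y in test]
    return pearson(predicted, actual)


if __name__ == "__main__":
    rows = make_synthetic_neighbourhoods(1000)
    print(f"held-out correlation: {holdout_correlation(rows):.2f}")
```

A high correlation on the held-out half is what "these correlations held true for neighbourhoods it had never seen before" means in practice. With real data the model would see thousands of car categories rather than a single sedan share.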


The system has limitations: unlike a census, it generates predictions, not facts, and the more fine-grained those predictions are the less certain they become. The researchers reckon their system is accurate to the level of a precinct, an American political division that contains about 1,000 people. And because those predictions rely on the specific, accurate data generated by traditional surveys, it seems unlikely ever to replace them.


On the other hand, it is much cheaper and much faster. Dr Gebru’s system ran on a couple of hundred processors, a modest amount of hardware by the standards of artificial-intelligence research. It nevertheless managed to crunch through its 50m images in two weeks. A human, even one who could classify all the cars in an image in just ten seconds, would take 15 years to do the same.
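
The arithmetic behind that comparison can be checked in a few lines. The sketch below assumes the hypothetical human works around the clock with no breaks and takes a year to be 365.25 days; the constant and function names are made up for illustration.

```python
# Figures quoted in the article.
IMAGES = 50_000_000
SECONDS_PER_IMAGE = 10      # the hypothetical fast human classifier
MACHINE_DAYS = 14           # "two weeks"

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY  # assumption: average year length


def human_years(images=IMAGES, seconds_per_image=SECONDS_PER_IMAGE):
    """Years of non-stop work for one person to label every image."""
    return images * seconds_per_image / SECONDS_PER_YEAR


def machine_images_per_second(images=IMAGES, days=MACHINE_DAYS):
    """Average throughput needed to finish all images in the given days."""
    return images / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    print(f"human: about {human_years():.1f} years")
    print(f"machine: about {machine_images_per_second():.1f} images per second")
```

This gives roughly 15.8 years for the human, in line with the article's 15, and an average of about 41 images per second across the whole cluster.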

【crunch】v. To make a crushing sound; to chew noisily; to grind over something noisily. Here: to process a large amount of data.

The other advantage of the AI approach is that it can be re-run whenever new data become available. As Dr Gebru points out, Streetview is not the only source of information out there. Self-driving cars, assuming they catch on, will use cameras, radar and the like to keep track of their surroundings. They should, therefore, produce even bigger data sets. (Vehicles made by Tesla, an electric-car firm, are capturing such information even now.) Other kinds of data, such as those from Earth-imaging satellites, which Google also uses to refresh its maps, could be fed into the models, too. De Vauban’s “designated moment” could soon become a constantly updated one.

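The other advantage of this AI approach is that it can be re-run whenever new data become usable. As Dr Gebru points out, Google’s Streetview is not the only source of information. Once self-driving cars catch on, the cameras, radar and similar sensors they use to track their surroundings will produce even larger data sets. (Electric cars made by Tesla are already collecting such information.) Other kinds of data, such as the satellite imagery Google uses to refresh its maps, could also be fed into the models. De Vauban’s “designated moment” could soon become “every moment”.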
【catch on】Become popular